1.  Introduction


In this paper we consider the problem of testing a hypothesis $H_0$
versus an alternative $H_1$ for situations in which a large number of tests
are to be performed and in which $H_1$ occurs only rarely. Such situations
arise in many applications. For example, in search radar, the radar
examines a large number of resolution cells on each scan of a search area,
but it is expected that targets will be present in only a small fraction
of the resolution cells. Similar situations arise in applications such
as medical testing and quality control.

The tests we consider here are described as follows: For each test,
a sample is taken and a standard procedure is applied to decide whether
or not $H_0$ can be accepted. If $H_0$ is accepted, then the test ends. If $H_0$
is not accepted, then a second sample is taken from the same population
and a second test is performed to double-check the result. Thus, $H_0$ is
rejected only if it is rejected on the basis of both samples. The
motivation for this type of testing is that, for small probability of Type
I error on the first check, the average sample size under $H_0$ will be very
nearly that of the first sample, whereas the overall performance should be
superior to a test using only the first sample. The trade-off, of course,
is that the average sample size under $H_1$ may be larger than that required
for a comparable fixed-sample-size test; however, since $H_1$ is assumed to
occur only rarely, the overall average sample size should be smaller than
that of a comparable fixed-sample-size test.

In this paper, we consider two such double-check procedures: Procedure
1, in which the double-check is performed only on the second sample, and
Procedure 2, in which the double-check is performed on the first and second
samples combined. To investigate the properties of these tests, we consider
primarily the specific problem of location testing with i.i.d. normal
errors. We compare the power functions and average sample sizes of these
procedures to those of fixed-sample-size tests of the same significance
level and with various sample sizes. We also consider the Pitman
asymptotic efficiency of these two procedures relative to both
fixed-sample-size tests and sequential tests. Nonparametric versions of the new
procedures are also proposed and an analysis of their behavior is included.
Finally, a generalization is considered in which $H_0$ is rejected only after
being rejected on $k$ samples, where $k$ is a positive integer.


2.  General  Description  and  Error-Probability  Performance 


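Suppose we have two samples $X_i = \theta + \epsilon_i$, $i = 1, 2, \ldots, n_1$,
and $X_i = \theta + \epsilon_i$, $i = n_1+1, \ldots, n$, where
$\epsilon_1, \ldots, \epsilon_n$ is an i.i.d. sequence of $\mathcal{N}(0,1)$ errors and
where $\theta \ge 0$. Consider the two tests for the hypothesis $H_0\colon \theta = 0$
versus the alternative $H_1\colon \theta > 0$,

$$\varphi_{DC1}(\underline{x}) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n_1} x_i \ge \tau_1 \text{ and } \sum_{i=n_1+1}^{n} x_i \ge \tau_2 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

and

$$\varphi_{DC2}(\underline{x}) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n_1} x_i \ge \tau_1 \text{ and } \sum_{i=1}^{n} x_i \ge \tau_2' \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
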


where $\varphi_{DC1}(\underline{x})$ [resp. $\varphi_{DC2}(\underline{x})$] is the probability with which we accept $H_1$
given that $(X_1, \ldots, X_n) = (x_1, \ldots, x_n) = \underline{x}$, and $\tau_1$ and $\tau_2$ [resp. $\tau_2'$] are
chosen to give the desired probability of Type I error. Note that size $\alpha$ can
be achieved by choosing $\tau_1$ to yield size $\alpha^* \ge \alpha$ on the first check (i.e.,
$\tau_1 = \sqrt{n_1}\,\Phi^{-1}(1-\alpha^*)$, where $\Phi$ denotes the unit normal distribution function)
and then choosing $\tau_2$ or $\tau_2'$ to give overall size $\alpha$. For (1), the second
threshold is thus given by $\tau_2 = \sqrt{n-n_1}\,\Phi^{-1}(1-\alpha/\alpha^*)$, and for (2) the second
threshold is given by $\tau_2' = \sqrt{n}\,b$, where $b$ is the solution to the equation

$$\alpha - \alpha^* = F(\Phi^{-1}(1-\alpha^*), b) - \Phi(b), \qquad (3)$$

where $F$ denotes the joint unit normal distribution function with correlation
coefficient $(1+K)^{-1/2}$, where $K = (n-n_1)/n_1$.
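
The threshold rules above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes $\alpha < \alpha^* < 1$, uses only the Python standard library, and evaluates the bivariate normal probability by a simple Simpson-rule integration. The function names `upper_orthant` and `double_check_thresholds` are hypothetical. Since $P(Z_1 \ge a, Z_2 \ge b) = 1 - \Phi(a) - \Phi(b) + F(a,b)$, solving (3) for $b$ is the same as requiring this upper-orthant probability to equal $\alpha$ when $a = \Phi^{-1}(1-\alpha^*)$.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def upper_orthant(a, b, rho, steps=2000, span=10.0):
    """P(Z1 >= a, Z2 >= b) for unit normals with correlation 0 <= rho < 1.

    Conditions on Z1 = z and applies Simpson's rule on [a, a + span].
    """
    s = sqrt(1.0 - rho * rho)
    h = span / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        z = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(z) * (1.0 - STD.cdf((b - rho * z) / s))
    return total * h / 3.0


def double_check_thresholds(alpha, alpha_star, n1, n):
    """Return (tau1, tau2, tau2_prime) for tests (1) and (2).

    Assumes alpha < alpha_star < 1 and 0 < n1 < n.
    """
    a = STD.inv_cdf(1.0 - alpha_star)
    tau1 = sqrt(n1) * a
    tau2 = sqrt(n - n1) * STD.inv_cdf(1.0 - alpha / alpha_star)
    rho = sqrt(n1 / n)  # equals (1 + K) ** -0.5 with K = (n - n1) / n1
    lo, hi = -10.0, 10.0
    for _ in range(60):  # bisection: the orthant probability decreases in b
        b = 0.5 * (lo + hi)
        if upper_orthant(a, b, rho) > alpha:
            lo = b
        else:
            hi = b
    return tau1, tau2, sqrt(n) * 0.5 * (lo + hi)


if __name__ == "__main__":
    print(double_check_thresholds(alpha=0.01, alpha_star=0.2, n1=8, n=20))
```

Bisection is used here only because the orthant probability is monotone in $b$; any one-dimensional root finder would serve equally well.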

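Note that the average sample size under $H_0$ of either (1) or (2) is
given by $(1+\alpha^* K)n_1$, and the average sample size under $H_1$ of either test
is given by $(1+\beta^* K)n_1$, where $\beta^* = 1 - \Phi(\Phi^{-1}(1-\alpha^*) - \sqrt{n_1}\,\theta)$. Also, the
power functions (i.e., power versus $\theta$) of $\varphi_{DC1}$ and $\varphi_{DC2}$ are easily computed
using standard techniques. The behavior of $\varphi_{DC1}$ and $\varphi_{DC2}$ was compared to
that of the fixed-sample-size test

$$\varphi_{FSS}(\underline{x}) = \begin{cases} 1, & \text{if } \sum_{i=1}^{M} x_i \ge \sqrt{M}\,\Phi^{-1}(1-\alpha) \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
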

for a variety of choices of the test parameters $\alpha$, $\alpha^*$, $K$, and $M$ and for
$n = 20$. Table 1 contains data for the particular cases $\alpha = 0.01$, $n_1 = 8$,
$\alpha^* = 0.5$ and $0.2$, and $M = 10$, 15, and 20. These values of $n_1$ and $\alpha^*$ are used
because they appear to be nearly optimum for $\varphi_{DC[\ldots]}$ with this choice of $\alpha$
and $n$, and the behavior exhibited in Table 1 is typical of the general
optimum behavior of $\varphi_{DC1}$ and $\varphi_{DC2}$. Note from Table 1 that, for the choice
$\alpha^* = 0.5$, $\varphi_{DC1}$ and $\varphi_{DC2}$ have power functions very near those of $\varphi_{FSS}$ with
$M = 15$ and 20, respectively. In this case, $\varphi_{DC2}$ is clearly superior to the
comparable fixed-sample-size test, whereas $\varphi_{DC1}$ is not. Alternatively, for
the choice $\alpha^* = 0.2$, $\varphi_{DC[\ldots]}$ has a power function which is greater than that of
the $M = 15$ version of $\varphi_{FSS}$ while requiring only 10.4 samples on the average
under $H_0$. Note that the $M = 10$ version of $\varphi_{FSS}$ is not comparable to $\varphi_{DC[\ldots]}$
in this case. Note also that, with $\alpha^* = 0.2$, $\varphi_{DC[\ldots]}$ still compares favorably
with the $M = 20$ version of $\varphi_{FSS}$ in terms of power while retaining the small
average sample size under $H_0$. It is noteworthy that the average sample
sizes of the double-check tests can in no case exceed $n$ under either
hypothesis.
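
As a numerical illustration of the average-sample-size expressions and of the power function of Procedure 1, the sketch below evaluates them for the parameter values quoted above ($n = 20$, $n_1 = 8$, $\alpha = 0.01$). It is a minimal sketch under the normal location model of this section; the function names are hypothetical and only the Python standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def average_sample_size(p_first_reject, n1, n):
    """n1 + p * (n - n1), i.e. (1 + p * K) * n1 with K = (n - n1) / n1."""
    return n1 + p_first_reject * (n - n1)


def first_check_power(theta, alpha_star, n1):
    """beta* = 1 - Phi(Phi^{-1}(1 - alpha*) - sqrt(n1) * theta)."""
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - alpha_star) - sqrt(n1) * theta)


def power_dc1(theta, alpha, alpha_star, n1, n):
    """Power of test (1): both independent checks must reject H0."""
    second = 1.0 - STD.cdf(
        STD.inv_cdf(1.0 - alpha / alpha_star) - sqrt(n - n1) * theta
    )
    return first_check_power(theta, alpha_star, n1) * second


def power_fss(theta, alpha, m):
    """Power of the fixed-sample-size test (4) with m observations."""
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - alpha) - sqrt(m) * theta)


if __name__ == "__main__":
    for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
        b_star = first_check_power(theta, 0.2, 8)
        print(theta,
              round(power_dc1(theta, 0.01, 0.2, 8, 20), 3),
              round(power_fss(theta, 0.01, 15), 3),
              round(average_sample_size(b_star, 8, 20), 2))
```

At $\theta = 0$ the first-check rejection probability is $\alpha^*$, so the average sample size reduces to $(1+\alpha^* K)n_1$; because that probability never exceeds one, the average can never exceed $n$.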


Another comparison of interest is to consider the Pitman asymptotic
efficiencies (as defined, for example, in Noether (1955)) of the tests
(1) and (2) relative to (4). In particular it is straightforward to show
that, for fixed $\alpha$ and power $\beta$, under $H_0$ we have

$$ARE^0_{DC,FSS} = \frac{[\Phi^{-1}(1-\alpha) - \Phi^{-1}(1-\beta)]^2}{(1+\alpha^* K)\,R^2} \qquad (5)$$

where, for $\varphi_{DC1}$, the parameter $R$ is the solution to the equation

$$\beta = \Phi(R - \Phi^{-1}(1-\alpha^*))\,\Phi(\sqrt{K}\,R - \Phi^{-1}(1-\alpha/\alpha^*)), \qquad (6)$$

and, for $\varphi_{DC2}$, $R$ solves

$$\beta = \Phi(R - \Phi^{-1}(1-\alpha^*)) - \Phi(b - \sqrt{1+K}\,R) + F(\Phi^{-1}(1-\alpha^*) - R,\; b - \sqrt{1+K}\,R), \qquad (7)$$

where $b$ and $F$ are as in (3). Similarly, under $H_1$ we have

$$ARE^1_{DC,FSS} = \frac{1+\alpha^* K}{1+\beta^* K}\, ARE^0_{DC,FSS} \qquad (8)$$

where $\beta^*$ is as defined above.
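
For Procedure 1, the efficiencies (5) and (8) can be evaluated by solving (6) for $R$ numerically. The sketch below does this by bisection. It is an illustration rather than the authors' computation: it assumes $\alpha < \alpha^* < 1$, $\alpha < \beta < 1$ and $K > 0$, it takes $\beta^* = \Phi(R - \Phi^{-1}(1-\alpha^*))$ (the earlier expression for $\beta^*$ with $\sqrt{n_1}\,\theta = R$), and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def solve_r_dc1(alpha, beta, alpha_star, K):
    """Solve equation (6) for R by bisection (power is increasing in R)."""
    a1 = STD.inv_cdf(1.0 - alpha_star)
    a2 = STD.inv_cdf(1.0 - alpha / alpha_star)

    def power(r):
        return STD.cdf(r - a1) * STD.cdf(sqrt(K) * r - a2)

    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if power(mid) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def are_dc1(alpha, beta, alpha_star, K):
    """Return (ARE under H0, ARE under H1) of test (1) relative to test (4)."""
    r = solve_r_dc1(alpha, beta, alpha_star, K)
    num = (STD.inv_cdf(1.0 - alpha) - STD.inv_cdf(1.0 - beta)) ** 2
    are0 = num / ((1.0 + alpha_star * K) * r * r)            # equation (5)
    beta_star = STD.cdf(r - STD.inv_cdf(1.0 - alpha_star))   # first-check power
    are1 = (1.0 + alpha_star * K) / (1.0 + beta_star * K) * are0  # equation (8)
    return are0, are1


if __name__ == "__main__":
    print(are_dc1(alpha=0.01, beta=0.99, alpha_star=0.1, K=1.0))
```

With $\alpha^* = \sqrt{\alpha}$ and $K = 1$ the two checks are symmetric, so (6) has the closed-form solution $R = \Phi^{-1}(1-\sqrt{\alpha}) - \Phi^{-1}(1-\sqrt{\beta})$, which gives a convenient check on the root finder.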

Values of $ARE^j_{DC,FSS}$ for $j = 0$ and 1 are given in Table 2 for a variety
of values of $\alpha^*$, $K$, $\alpha$, and $\beta$. Note that $\varphi_{DC2}$ is uniformly superior to
$\varphi_{FSS}$ under $H_0$ for the ranges of parameters considered. Furthermore, in
each case, values of $\alpha^*$ and $K$ can be chosen so that $\varphi_{DC2}$ is nearly as
efficient as $\varphi_{FSS}$ under $H_1$ as well. As expected, $\varphi_{DC1}$ is slightly less
efficient than $\varphi_{DC2}$, but $\alpha^*$ and $K$ can be chosen to yield efficiency under
$H_0$ higher than that of $\varphi_{FSS}$ in each case considered. The conditions of
Table 2(c) (i.e., $\alpha = 10^{-[\ldots]}$, $\beta = 0.95$) appear to be most favorable for
performance under $H_0$ of either double-check test. These general conditions
(i.e., very small $\alpha$ and moderate $(1-\beta)$) are the most prevalent for many
testing problems (such as that arising in search radar). The antipodal
conditions (very small $(1-\beta)$ and moderate $\alpha$) are less favorable for
performance of the double-check tests. However, these latter conditions
carry the implication that Type II errors are more significant than Type I
errors, and thus that the double-check should be performed only when rejecting
$H_1$ rather than when rejecting $H_0$. This would result in the performance
tabulated in Table 2(c) with the roles of $H_0$ and $H_1$ reversed.

Of course, the optimum multistage test (in terms of average sample
size) is the Wald sequential probability ratio test (SPRT). The asymptotic
efficiency of the SPRT relative to $\varphi_{FSS}$ has been considered by Paulson (1947) and
Bechhofer (1960), and by combining their results with (5) and (8) the
asymptotic efficiencies of $\varphi_{DC1}$ and $\varphi_{DC2}$ relative to the SPRT can be
computed straightforwardly. Typical values of the asymptotic efficiency of
$\varphi_{DC2}$ (with $\alpha^* = 3\sqrt{\alpha}$ and $K = 2$) relative to the SPRT are given in Table 3.
Note that these values range from approximately 36% to approximately 65%
under $H_0$ and from approximately 11% to approximately 36% under $H_1$. Thus,
as is expected, the SPRT is superior in performance to $\varphi_{DC2}$. However, the
double-check tests are still preferable to the SPRT for many applications
for several reasons. First, the double-check test is much simpler to
implement since it requires at most two comparisons. Further, the thresholds
of the double-check tests can be set without knowledge of the true value
of $\theta$ under $H_1$. This is not true of the SPRT. Moreover, the SPRT can be
less efficient than even the fixed-sample-size test if an incorrect value
of $\theta$ is assumed (Wald (1947)). Finally, the maximum value of the sample
size is finite for the double-check test, whereas the sample size of the
SPRT is unbounded.

4.  A Nonparametric Version of $\varphi_{DC}$

Nonparametric versions of (1) and (2) are easily constructed by
replacing the linear statistics with ($H_0$) distribution-free statistics. For
example, if $X_i$ is replaced by $\mathrm{sgn}(X_i)$ and randomization is introduced on
the threshold boundaries, then the tests of (1) and (2) become nonparametric
for the hypothesis $P(X_i < 0) = 1/2$. The power functions (for the alternative
$P(X_i > 0) = p > 1/2$) of these particular nonparametric versions of (1) and (2)
are compared to that of the fixed-sample-size sign test (i.e., (4) with
$X_i$ replaced by $\mathrm{sgn}(X_i)$ and with randomization on the threshold boundary) in
Table 4. Note that the Pitman asymptotic relative efficiencies (for location
testing) between these modified tests will be the same as those for the
linear tests within mild regularity conditions on the distribution of the
$\epsilon_i$ (such as those given in Noether (1955)). It is also noteworthy that,
again within regularity, the asymptotic location-testing efficiencies of
these double-check sign tests relative to the linear test of (4) are
$4 f^2(0)\,\mathrm{Var}(\epsilon_i)$ times the value computed from (5) and (8), where $f$ is the
probability density of $\epsilon_i$. For the case of normal errors we have
$4 f^2(0)\,\mathrm{Var}(\epsilon_i) = 2/\pi$, and thus, whenever a value from Table 2 exceeds
$\pi/2 = 1.57$, the (nonparametric) double-check sign test is more efficient
than the Neyman-Pearson test for normal errors.
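
The scaling factor $4 f^2(0)\,\mathrm{Var}(\epsilon_i)$ is easy to check numerically. The sketch below evaluates it for unit normal errors and, purely as an added illustration that is not taken from the text, for a Laplace (double-exponential) density with unit scale, for which $f(0) = 1/2$ and the variance is 2. The function name `sign_test_factor` is hypothetical.

```python
from math import pi
from statistics import NormalDist


def sign_test_factor(density_at_zero, variance):
    """Return 4 * f(0)^2 * Var(eps), the sign-test efficiency multiplier."""
    return 4.0 * density_at_zero ** 2 * variance


# Unit normal errors: f(0) = 1 / sqrt(2 * pi) and Var = 1, so the factor is 2 / pi.
normal_factor = sign_test_factor(NormalDist().pdf(0.0), 1.0)

# Illustrative assumption: Laplace errors with unit scale, f(0) = 0.5, Var = 2.
laplace_factor = sign_test_factor(0.5, 2.0)

# A double-check efficiency from (5) or (8) must exceed this value for the
# double-check sign test to beat the linear test (4) under normal errors.
break_even = 1.0 / normal_factor

if __name__ == "__main__":
    print(normal_factor, laplace_factor, break_even)
```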


5.  k-Stage Tests


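The above analysis of the double-check tests of (1) and (2) can be
extended straightforwardly to tests which reject $H_0$ only after $k$ samples.
For example, consider the following test based on $k n_1$ i.i.d. observations:

$$\varphi_{kC}(\underline{x}) = \begin{cases} 1, & \text{if } \min\left\{ \sum_{i=1}^{n_1} x_i,\; \sum_{i=n_1+1}^{2n_1} x_i,\; \ldots,\; \sum_{i=(k-1)n_1+1}^{k n_1} x_i \right\} \ge \sqrt{n_1}\,\Phi^{-1}(1-\alpha^{1/k}) \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

Note that the case $k = 2$ is $\varphi_{DC1}$ with $\alpha^* = \sqrt{\alpha}$ and $K = 1$. In general this
test distributes the Type I error uniformly over the $k$ samples. The
asymptotic efficiency under $H_0$ of (9) relative to (4) is given
straightforwardly by

$$ARE^0_{kC,FSS} = \frac{(1-\alpha^{1/k})\,[\Phi^{-1}(1-\alpha) - \Phi^{-1}(1-\beta)]^2}{(1-\alpha)\,[\Phi^{-1}(1-\alpha^{1/k}) - \Phi^{-1}(1-\beta^{1/k})]^2} \qquad (10)$$

and under $H_1$ we have

$$ARE^1_{kC,FSS} = \frac{(1-\beta^{1/k})(1-\alpha)}{(1-\beta)(1-\alpha^{1/k})}\, ARE^0_{kC,FSS}. \qquad (11)$$
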
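
Expressions (10) and (11) are closed-form and can be evaluated directly. The sketch below is a minimal illustration using only the Python standard library, with the hypothetical function name `are_k_stage`. It assumes $0 < \alpha < \beta < 1$ and takes the numerator of (10) to be $(1-\alpha^{1/k})[\Phi^{-1}(1-\alpha) - \Phi^{-1}(1-\beta)]^2$, which follows from dividing the fixed-sample-size requirement by the average sample size $n_1(1-\alpha)/(1-\alpha^{1/k})$ under $H_0$.

```python
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def are_k_stage(alpha, beta, k):
    """Return (ARE under H0, ARE under H1) of the k-stage test (9) vs. test (4).

    alpha is the overall size and beta the overall power; each stage has
    size alpha ** (1 / k) and must reach power beta ** (1 / k).
    """
    a_k = alpha ** (1.0 / k)
    b_k = beta ** (1.0 / k)
    fss = (STD.inv_cdf(1.0 - alpha) - STD.inv_cdf(1.0 - beta)) ** 2
    stage = (STD.inv_cdf(1.0 - a_k) - STD.inv_cdf(1.0 - b_k)) ** 2
    are0 = (1.0 - a_k) * fss / ((1.0 - alpha) * stage)                    # (10)
    are1 = (1.0 - b_k) * (1.0 - alpha) / ((1.0 - beta) * (1.0 - a_k)) * are0  # (11)
    return are0, are1


if __name__ == "__main__":
    for k in (2, 3, 4, 6, 8, 10):
        are0, are1 = are_k_stage(alpha=1e-2, beta=1.0 - 1e-2, k=k)
        print(k, round(are0, 3), round(are1, 3))
```

For $\alpha = 1-\beta = 10^{-2}$ the sketch gives values close to 1.323 (.730) at $k = 2$ and 1.386 (.504) at $k = 4$, which agree with figures appearing among the Table 5 data.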


Table 5 gives values of the quantities of (10) and (11) for several values
of $\alpha$ and $\beta$ and for values of $k$ from 2 to 10. Note that the addition of
stages improves performance under $H_0$ to a point (significantly in some cases),
but that there is a diminishing return and even decreased performance
associated with larger numbers of stages. Also note that, as one might
expect, the performance under $H_1$ degrades with increasing $k$. As a general
rule,  it  appears  that  the  addition  of  more  stages  is  helpful  in  those  cases 


6.  Conclusions


In  this  paper,  we  have  proposed  and  analyzed  a  potentially  useful 
class  of  multistage  tests.  It  should  be  noted  that,  although  we  have 
considered  only  the  linear  test  statistic  with  normal  errors  and  the  sign 
test  statistic,  the  relative  efficiency  expressions  and  numerical  data  of 
Sections  3  and  5  are  applicable  to  much  broader  classes  of  parametric 
testing  problems,  test  statistics  and  error  distributions,  subject  to  mild 
regularity  conditions  (such  as  those  of  Lai  (1978),  Noether  (1955),  and/or 
Paulson  (1947)).  As  demonstrated  by  the  analysis  of  the  above  sections, 
the  proposed  tests  are  intermediate  in  terms  of  efficiency  to  fixed-sample- 
size  tests  and  sequential  probability  ratio  tests.  Their  implementational 
complexity  and  insensitivity  to  parameter  mismatch,  however,  are  comparable 
to  those  of  fixed-sample-size  tests.  Thus,  the  tests  proposed  here  may  be 
preferable  to  both  the  fixed-sample-size  test  and  the  sequential  probability 
ratio  test  for  many  applications. 
0.220    0.294    18.45
0.487    0.637    19.46
0.754    0.892    19.86
0.919    0.983    19.97

Table 2:  Asymptotic efficiencies under $H_0$ and $H_1$ of the double-check
tests relative to the fixed-sample-size test.  The values in
parentheses are those under $H_1$.
Table 2 (continued)
  proc #1         proc #2         proc #1         proc #2

1.916 (.500)    1.915 (.500)    2.102 (.551)    2.103 (.551)
1.915 (.662)    1.918 (.663)    2.089 (.724)    2.104 (.729)
1.872 (.771)    1.909 (.786)    1.967 (.808)    2.067 (.849)
1.784 (.810)    1.883 (.855)    1.806 (.816)    2.004 (.906)

$\alpha = 0.05$, $(1-\beta) = 10^{-[\ldots]}$

   $\alpha^* = \sqrt{\alpha}$                  $\alpha^* = 1.5\sqrt{\alpha}$
  proc #1         proc #2         proc #1         proc #2

1.050 (.597)    1.052 (.598)    1.023 (.645)    1[...]
1.047 (.641)    1.101 (.673)     .911 (.608)    1[...]
 .912 (.597)    1.142 (.748)     .769 (.542)    [...] (.845)
 .606 (.449)    1.201 (.890)     .522 (.407)    1.239 (.964)

 .638 (.533)     .555 (.474)     .399 (.355)

1.151 (.940)    1.175 (.981)    1.165 (.995)    1.123 (1.000)


(.171)
.612 (.110)

  $10^{-3}$         $10^{-4}$

.580 (.341)     .490 (.343)
.598 (.238)     .527 (.249)
.611 (.182)     .555 (.196)
.630 (.126)     .593 (.140)

.356 (.355)
.404 (.270)
.438 (.220)
.489 (.164)


Table 3:  Asymptotic efficiencies of Procedure 2 double-check test
relative to the sequential probability ratio test under
$H_0$ and $H_1$.  The values in parentheses are those under $H_1$.
The double-check test has $\alpha^* = 3\sqrt{\alpha}$ and $K = 2$.


$M = 10$   $M = 15$   $M = 20$   $\beta_1(p)$   $\beta_2(p)$   $E(N|p)$   $\beta_1(p)$   $\beta_2(p)$   $E(N|p)$

[...]   0.072   0.062   0.071   15.50
[...]   0.874   0.966   0.894   0.965   19.95

$\alpha^* = 0.2$

0.01    0.01    10.40
0.064   0.070   12.63
0.240   0.276   15.40
0.578   0.652   18.01
0.910   0.949   19.64

Table 4:  Power functions of fixed-sample-size sign tests and double-check
sign tests.  The value $p$ is the probability of having a positive
observation.  $\beta_1(\cdot)$ and $\beta_2(\cdot)$ are for double-check procedures
1 and 2, respectively.  The significance level is $\alpha = 0.01$.


Number of     $\alpha = 1-\beta$    $\alpha = 1-\beta$    $\alpha = 1-\beta$    $\alpha = 10^{-2}$,      $\alpha = 10^{-6}$,
stages ($k$)  $= 10^{-2}$      $= 10^{-4}$      $= 10^{-6}$      $1-\beta = 10^{-6}$    $1-\beta = 0.1$

    2         1.323 (.730)    [...]           1.417 (.709)    1.196 (.658)     1.631 (.838)
    3         1.401 (.591)    [...]           1.680 (.566)    1.198 (.504)     2.099 (.732)
    4         1.386 (.504)    1.748 (.486)    1.847 (.477)    1.143 (.414)     2.442 (.655)
    6         1.281 (.396)    1.781 (.378)    1.995 (.369)    1.006 (.310)     2.849 (.551)
    8         1.166 (.331)    1.718 (.314)    2.009 (.305)     .886 (.251)     3.023 (.481)
   10         1.064 (.287)    1.628 (.270)    1.964 (.262)     .789 (.212)     3.072 (.430)

Table 5:  Asymptotic efficiencies of k-stage test relative to the
fixed-sample-size test with various error probabilities $\alpha$ and $1-\beta$.
The k-stage test has an equal number of samples in each stage
and the significance level at each stage is $\alpha^{1/k}$.